本文提出了一个简单而有效的框架蒙版,该框架将新提出的掩盖自distillation纳入对比的语言图像预处理中。掩盖自distillation的核心思想是将表示从完整的图像提取到蒙版图像预测的表示形式。这种合并享有两个重要的好处。首先,掩盖的自我验证目标是本地贴片表示学习,这与视觉对比度的互补,专注于与文本相关的表示。二,掩盖的自我验证也与视觉语言对比符合训练目标的视野对比是一致的。视觉编码器用于功能对齐,因此能够学习本地语义从该语言中获得间接监督。我们提供了专门设计的实验,并进行了全面的分析,以验证这两个好处。从经验上讲,我们表明,当MaskClip应用于各种具有挑战性的下游任务时,可以在线性探测,填充和零拍摄中取得卓越的结果,并在语言编码器的指导下取得了卓越的结果。
translated by 谷歌翻译
本文回顾了AIM 2022上压缩图像和视频超级分辨率的挑战。这项挑战包括两条曲目。轨道1的目标是压缩图像的超分辨率,轨迹〜2靶向压缩视频的超分辨率。在轨道1中,我们使用流行的数据集DIV2K作为培训,验证和测试集。在轨道2中,我们提出了LDV 3.0数据集,其中包含365个视频,包括LDV 2.0数据集(335个视频)和30个其他视频。在这一挑战中,有12支球队和2支球队分别提交了赛道1和赛道2的最终结果。所提出的方法和解决方案衡量了压缩图像和视频上超分辨率的最先进。提出的LDV 3.0数据集可在https://github.com/renyang-home/ldv_dataset上找到。此挑战的首页是在https://github.com/renyang-home/aim22_compresssr。
translated by 谷歌翻译
预测不同托卡马克人的破坏是要克服的巨大障碍。未来的Tokamaks在高性能排放时几乎无法忍受中断。很少有高性能的破坏排放几乎无法构成丰富的训练集,这使得当前数据驱动的方法难以获得可接受的结果。能够将在一个Tokamak训练的中断预测模型转移到另一种训练的机器学习方法以解决该问题。关键是一个包含特征提取器的破坏预测模型,该模型能够在Tokamak诊断数据中提取常见的破坏前体痕迹,并具有可转移的破坏分类器。基于上面的问题,该论文首先提出了专门针对Tokamaks上的普通诊断中的破坏前体特征而设计的深融合功能提取器,该特征是根据当前已知的破坏前体,为可转移模型提供了有希望的基础。通过与J-Text上的手动特征提取进行比较,可以证明融合功能提取器。基于在J-TEXT上训练的功能提取器,将中断预测模型转移到East数据中,仅来自East实验的20次放电。该性能与经过1896年出院的模型相当。从其他模型培训方案之间的比较,转移学习表明了其在预测不同托卡马克人的破坏方面的潜力。
translated by 谷歌翻译
如何学习一个促进所有面部分析任务的通用面部表示?本文对此目标进行了一步。在本文中,我们研究了面对面分析任务的预先训练模型的转移性能,并以视语言方式为一般面部代表学习学习的框架,称为Farl。一方面,该框架涉及从图像文本对学习高级语义含义的对比损失。另一方面,我们提出通过添加掩蔽图像建模来同时探索低级信息以进一步增强面部表示。我们对Laion-face进行预训练,一个包含大量面部图像文本对的数据集,并评估在多个下游任务上的表示功能。我们表明Farl与以前的预先训练的模型相比,Farl实现了更好的转移性能。我们还验证了低数据制度的优势。更重要的是,我们的模型在面部分析任务上超越了最先进的方法,包括面部解析和面部对齐。
translated by 谷歌翻译
在本说明中,我们调查我们如何从少量其条目重建大矩阵的最佳排名 - $ rysimation。我们表明即使数据矩阵是完整等级并且不能通过低秩矩阵近似近似的数据矩阵,其最佳低秩近似可能仍然可以从少量其条目中可靠地计算或估计其最佳的低秩近似。与统计观点特别相关:数据矩阵的最佳低秩近似通常比自身更具感兴趣,因为它们捕获更稳定的数据生成模型的更加稳定和多个可再现特性。特别是,我们调查了两种不可知论者方法:第一个基于光谱截断;第二个是投影梯度基于下降的优化过程。我们认为,虽然第一种方法是直观且合理有效的,但后者通常具有较高的性能。我们表明错误取决于矩阵对低等级的关闭程度。提出了理论和数值证据,以证明所提出的方法的有效性。
translated by 谷歌翻译
矩阵分解(MF)已广泛应用于建议系统中的协作过滤。它的贝叶斯变体可以得出用户和项目嵌入的后验分布,并且对稀疏评分更强大。但是,贝叶斯方法受到其后验参数的更新规则的限制,这是由于先验和可能性的结合。变量自动编码器(VAE)可以通过捕获后验参数和数据之间的复杂映射来解决此问题。但是,当前对合作过滤的VAE的研究仅根据明确的数据信息考虑映射,而隐含嵌入信息则被忽略了。在本文中,我们首先从两个观点(以用户为导向和面向项目的观点)得出了贝叶斯MF模型的贝叶斯MF模型的较低界限(ELBO)。根据肘部,我们提出了一个基于VAE的贝叶斯MF框架。它不仅利用数据,还利用嵌入信息来近似用户项目联合分布。正如肘部所建议的那样,近似是迭代的,用户和项目嵌入彼此的编码器的交叉反馈。更具体地说,在上一个迭代中采样的用户嵌入被馈送到项目端编码器中,以估计当前迭代处的项目嵌入的后验参数,反之亦然。该估计还可以关注交叉食品的嵌入式,以进一步利用有用的信息。然后,解码器通过当前重新采样的用户和项目嵌入方式通过矩阵分解重建数据。
translated by 谷歌翻译
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs are with a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a light-weighted model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborates that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
translated by 谷歌翻译
We aim to bridge the gap between our common-sense few-sample human learning and large-data machine learning. We derive a theory of human-like few-shot learning from von-Neuman-Landauer's principle. modelling human learning is difficult as how people learn varies from one to another. Under commonly accepted definitions, we prove that all human or animal few-shot learning, and major models including Free Energy Principle and Bayesian Program Learning that model such learning, approximate our theory, under Church-Turing thesis. We find that deep generative model like variational autoencoder (VAE) can be used to approximate our theory and perform significantly better than baseline models including deep neural networks, for image recognition, low resource language processing, and character recognition.
translated by 谷歌翻译
To generate high quality rendering images for real time applications, it is often to trace only a few samples-per-pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that the rendered pixels at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which makes the supersampling an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features to significantly improve their temporal stability. Then a reconstruct network based on a multi-scale U-Net with skip connections is adopted for reconstruction and generation of the desired high-resolution image. Experimental results and comparisons have shown that our proposed method can generate higher quality results of supersampling, without increasing the total number of ray-tracing samples, over current state-of-the-art methods.
translated by 谷歌翻译
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works utilize separated approaches to handle thing, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework named Panoptic-PartFormer. Moreover, we find the previous metric PartPQ biases to PQ. To handle both issues, we make the following contributions: Firstly, we design a meta-architecture that decouples part feature and things/stuff feature, respectively. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model as Panoptic-PartFormer. Secondly, we propose a new metric Part-Whole Quality (PWQ) to better measure such task from both pixel-region and part-whole perspectives. It can also decouple the error for part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross attention scheme to further boost part segmentation qualities. We design a new part-whole interaction method using masked cross attention. Finally, the extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70% on GFlops and 50% on parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
translated by 谷歌翻译